Overview of Stemming Algorithms for Indian and Non-Indian Languages

Authors

  • Dalwadi Bijal
  • Suthar Sanket
Abstract

Stemming is a pre-processing step in text mining applications and a very common requirement of natural language processing functions. Stemming is the process of reducing inflected words to their stem. Its main purpose is to reduce the different grammatical forms of a word, such as its noun, adjective, verb, and adverb forms, to a root form. Stemming is widely used in information retrieval systems, where it reduces the size of index files. In other words, the goal of stemming is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. In this paper we discuss different stemming algorithms for non-Indian and Indian languages, methods of stemming, accuracy, and errors. Keywords: Over-stemming, Under-stemming, Rule based stemming.
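As an illustration of the keywords above, the following is a minimal sketch of rule based suffix stripping in Python. The rule table, the `MIN_STEM_LENGTH` threshold, and the function name `stem` are assumptions made for this example; they are not taken from the paper and do not reproduce any published stemmer.

```python
# Hypothetical suffix rules, checked in order (longer suffixes first).
# Each rule is (suffix to strip, replacement to append).
SUFFIX_RULES = [
    ("ational", "ate"),
    ("ing", ""),
    ("ies", "y"),
    ("ed", ""),
    ("s", ""),
]

# Assumed guard: reject a rule if the resulting stem would be too short.
MIN_STEM_LENGTH = 3


def stem(word):
    """Return a crude stem by applying the first suffix rule that fits."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    # No rule applied: the word is returned unchanged.
    return word


# Inflected forms of one word collapse to a common stem.
print(stem("connected"), stem("connecting"), stem("connects"))

# Over-stemming: unrelated words are conflated ("news" becomes "new").
print(stem("news"), stem("new"))

# Under-stemming: related words stay apart ("running" vs. "run").
print(stem("running"), stem("run"))
```

With these rules, "news" and "new" end up with the same stem although they are different words (over-stemming), while "running" and "run" receive different stems although they are forms of the same word (under-stemming). A practical stemmer would need a much larger, language-specific rule set.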


Similar resources

A Comprehensive Analyze of Stemming Algorithms for Indian and Non-indian Languages

Stemming is a technique used for reducing inflected words to their stem or root form. This is applicable for both the suffix as well as prefix. Stemming is a preprocessing step in text mining application and commonly used for Natural Language Processing (NLP). A stemmer can execute operation of altering morphologically identical words to root word without performing morphological analysis of th...


Literature Review: Stemming Algorithms for Indian and Non-Indian Languages

I. Introduction Stemming plays an important role in Information Retrieval System (IRS) for improving the performance of all languages. The goal of stemming is to diminish inflectional and derivational variant forms of a word to a common base form. A stemmer can execute operation of transforming morphologically identical words to root word without performing morphological analysis of that term. ...


A Literature Review: Stemming Algorithms for Indian Languages

Stemming is the process of extracting the root word from a given inflected word. It also plays a significant role in numerous applications of Natural Language Processing (NLP). The stemming problem has been addressed in many contexts and by researchers in many disciplines. This expository paper presents a survey of some of the latest developments on stemming algorithms in data mining and also presents wit...


Statistical Investigation and Comparative Assessment of the Non-Performing Assets of Indian Commercial Banks

Non-performing assets (NPAs) have been a major cause of concern for Indian commercial banks in the recent past years. Many studies have been reported on the different aspects of NPAs in Indian banking system. However, there is a crucial lack of investigation on the comparative assessment of various types of banks such as Public sector banks, Private sector banks and foreign banks so that the tr...


Shear Waves Through Non Planar Interface Between Anisotropic Inhomogeneous and Visco-Elastic Half-Spaces

A problem of reflection and transmission of a plane shear wave incident at a corrugated interface between transversely isotropic inhomogeneous and visco-elastic half-spaces is investigated. Applying appropriate boundary conditions and using Rayleigh’s method of approximation expressions for reflection and transmission coefficients are obtained for the first and second order approximation of the...



Journal title:
  • CoRR

Volume abs/1404.2878

Publication date 2014